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The self-consistent procedure in electronic structure calculations is revisited using a new highly efficient and 
robust algorithm for solving the non-linear eigenvector problem i.e. H({tp})tp = Exp. This new scheme is 
derived from a generalization of the FEAST eigenvalue algorithm to account for the non-linearity of the 
Hamiltonian with the occupied eigenvectors. Using a series of numerical examples and the DFT-Kohn/Sham 
model, it will be shown that our approach can outperform the traditional SCF mixing-scheme techniques by 
providing a high converge rate, convergence to the correct solution regardless of the choice of the initial guess, 
and a significant reduction of the eigenvalue solve time in simulations. 
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I. INTRODUCTION 

Although first-principle calculations in general, and 
DFT in particular, have provided a practical (i.e. nu- 
merical tractable) path for solving the electronic struc- 
ture problem, they have introduced new numerical chal- 
lenges on their own. Within the single electron picture, 
the Hamiltonian operator depends on the occupied eigen- 
functions and the resulting eigenvalue problem becomes 
fully non-linear (i.e. H{{tp})ip = Etp), In practice, this 
non-linear eigenvector problem is commonly addressed 
using a self-consistent field method (SCF) wherein a se- 
ries of linear eigenvalue problems (i.e. Hip = Eip), needs 
to be solved iteratively until convergence^. Successfully 
reaching convergence by performing SCF iterations is of 
paramount importance to first-principle quantum chem- 
istry and solid-state physics simulations software. A typ- 
ical self-consistent iteration procedure for the discretized 
DFT/Kohn-Sham problem is represented in Figure [TJ 

Traditional SCF mixing methods employ successive ap- 
proximation iterates of a fixed point mapping to generate 
the new input electron density at each cycle. Examples 
of such methods include Newton-Broyden, or other An- 
derson technique, and Pulay mixing techniques using di- 
rect inversion of the iterative subspace (DIIS)2r— . Using 
DIIS, the new input electron density at iteration k, is 
generated from a linear combination of a series of pre- 
vious trial densities (i.e. {n^ fc_1 \ n( k ~ 2 \ n^ 0-3 ) , . . . }) by 
minimizing the successive residual errors between input 
and output densities. 

Three main numerical difficulties arise from standard 
SCF procedures: (i) the linear eigenvalue problem needs 
to be solved repeatedly a large number of times; (ii) the 
robustness of the self-consistent iterations is very sen- 
sitive to the choice of the initial guess; and (iii) there 
is, as of yet, no robust and efficient general-purpose it- 
erative numerical scheme for addressing the non-linear 
coupling with guaranteed convergence. Although much 
progress has been made to improve the converge rate of 
SCF techniques^—, the iterations can still be found to 
converge very slowly or unreliably^. 

In this paper, we present an efficient alternative to tra- 



Basic SCF iteration procedure 



Input: number of occupied states M 

1- Initialization 

Derive an initial guess for the electron density n 

2- Hamiltonian Construction H[n] 

requires also solving the Poisson equation 

3- Linear Eigenvalue Problem Hx = SSx 

solve for the M lowest eigenpairs ({-Em,x m }) 

4- Compute electron density 

M 

n = 2 |x m | 2 ; check convergence of n 

m=l 

continue iterations if needed 

5- Generate new input density for next iteration 

Use mixing approach to generate new density, 
go back to step 2 

Output: Ground state electron density n 

FIG. 1. Basic self-consistent procedure for solving the (dis- 
cretized) non-linear eigenvector problem H[n]x = _ESx with 
H real symmetric and S symmetric positive definite (S ^ I us- 
ing non-orthogonal basis functions), and obtaining the ground 
state electron density n = 2j^ m |x m | 2 (with a factor 2 for 
spin). For the DFT/Kohn-Sham problem, the Hamiltonian 
H[n] = -A + V H [n] + V X c[n] + V ext , is composed of 
the Hartree potential Vh (solution of Poisson equation), the 
exchange-correlation potential Vxc, and other external po- 
tential Vext including the ionic potential. 



ditional SCF iteration techniques that is ideally suited to 
address the aforementioned numerical difficulties when 
solving the discretized non-linear eigenvector problem 
H[n]x = ESx. The SCF problem is fully revisited us- 
ing a general-purpose numerical strategy derived from a 
modification of the FEAST eigenvalue algorithmic. We 
show that the new approach, named NLFEAST (for Non- 
Linear FEAST), is able to outperform the traditional 
SCF mixing-scheme techniques. NLFEAST exhibits a 
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high converge rate, converges to the correct solution re- 
gardless of the choice of the initial guess, and significantly 
reduces the eigenvalue solve time in simulations. 

The outline of this paper is as follows: In section |TT] 
we briefly summarize the numerical steps and the main 
properties of the FEAST algorithm presented in Ref. [3 
for solving the linear eigenvalue problem. In section IIIII 
we present the generalization of FEAST for solving the 
non-linear eigenvector problem. Numerical results and 
capabilities of the new NLFEAST approach are then pre- 
sented and discussed in Section ITVl 



II. THE FEAST ALGORITHM 

Within the SCF procedure in Figure [TJ solving the 
linear and symmetric eigenvalue problem at a given it- 
eration step becomes the most time-consuming part of 
the electronic structure calculations. In such compu- 
tations, one can identify two main challenges: (i) dis- 
cretization techniques that accommodate atomistic sys- 
tems with a high level of accuracy can lead to large size 
system matrices {H, S}, and (ii) the number of eigen- 
pairs needed to compute the electron density is propor- 
tional to the number of atoms in the system. In order to 
characterize large-scale nanostructures and complex sys- 
tems of current technological interest, many thousands 
of eigenpairs are indeed needed. In this regard, progress 
in large-scale electronic structure calculations can be tied 
together with advances in numerical algorithms for ad- 
dressing the eigenvalue problem. 

The recent FEAST algorithm 14 , which uniquely com- 
bines accuracy, robustness, high-performance and (lin- 
ear) parallel scalability, is ideally suited for addressing 
the electronic structure problem. FEAST is a general 
algorithm that can be used for solving the linear eigen- 
value problem to obtain all the eigenvalues and eigenvec- 
tors within a given search interval (e.g. [-Emm, £ m ax])- 
FEAST's main computational tasks consist of solving a 
small number of independent linear systems with mul- 
tiple right-hand sides along a complex contour and one 
reduced dense eigenvalue problem that is orders of mag- 
nitude smaller than the original one (the size of this re- 
duced problem is of the order of the number of eigenpairs 
inside the search interval). A full and detailed accounting 
of the implementation of the FEAST algorithm is given 
in Ref. If 4 . and the basic procedure is briefly summarized 
in Figure [2] 

Worthy of particular note are Steps 2 and 4(c) which, 
combined, generate a subspace Q that spans the eigen- 
vectors solutions in the user-defined interval. This is done 
by multiplying a given approximate eigenvector solution 
at iteration k by the density matrix p i.e. 

Q fe+1 = pSX k , (I) 

where the new subspace Q k+1 is then obtained 
at the FEAST iteration k + 1. Denoting by 



Basic FEAST Algorithm 

Input: interval [Emin, Emax], and an (over)-estimation 
Mo of the number of eigenpairs 

1- Initialization 

Select M > M random vectors Y G R NxM ° 

2- Contour Integration 

Compute Q NxMq = / dZ G ( Z ) Y ivxAf > 

3- Rayleigh-Ritz 

Form Hq MqxMo = Q T HQ and Sq MqXMo = Q T SQ 

Solve the reduced eigenvalue problem Hq$ = eSq# 

4- Subspace Iteration 

(a) Set E m = e m and X NxMq = Q NxUo & MqXMq 

(b) Check convergence (i.e. Trace{_E m }) 

(c) If needed go back to step 2 with Y = SX 
Output: All the M < Mo eigenpairs ({_E m ,x m }). 



FIG. 2. Basic FEAST procedure for solving the gener- 
alized eigenvalue problem Hx = ESx of size TV with H 
real symmetric and S symmetric positive definite (spd), 
and obtaining all the M eigenpairs within a given interval 
[Emin, Umax]- The density matrix appears implicitly in Step- 
2, using the complex contour integration of the Green's func- 
tion G(Z) = (ZS — H)~ . In practice, the vectors Q in Step- 
2 can be computed using a high-order numerical integration 
such as Gauss-Legendre quadrature, where only a small num- 
ber of linear systems, (Z?S — H)Q e = Y, need to be solved, 
one for each of a number of specific Gauss nodes Z e (asso- 
ciated with the weights tj e ) along a complex contour C, i.e. 

Q := $^Z e gC ^eQe- 



X NxA/ = {xi, x 2 , . . . , xjf} the M eigenvectors within the 
search interval, we note that the density matrix is for- 
mally given by p = XX T . In practice, the spectral pro- 
jection ([!]) is conveniently obtained by numerical integra- 
tion of a set of solutions of independent linear systems 
defined along a complex contour. It can be shown that 
FEAST consists of a reformulation of the subspace it- 
eration technique for spectral projectors 1 ^, which leads 
to an extremely robust and accurate numerical pro- 
cedure. By this means the linear FEAST algorithm 
achieves very high convergence rate of the eigenvectors 
subspace, and allows the non-linear extension of FEAST 
(i.e. NLFEAST) to achieve the kind of performance that 
will be demonstrated in this paper. 

The FEAST algorithm holds all the following impor- 
tant intrinsic properties: 

• using a high-order Gauss-Legendre quadrature, 8 to 
16 contour points suffice for FEAST to consistently 
converge in ^3 iterations to obtain up to thousands 
of eigenpairs with machine accuracy; 

• all multiplicities are naturally captured; 
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• no (explicit) orthogonalization procedure is re- 
quired; 

• pre-computed subspace can be reused as suitable 
initial guess (i.e. Step-1 in Figure [2]) for solving 
a series of eigenvalue problems that are close one 
another. One can ideally take advantage of this 
feature for addressing, in particular: (i) Step-3 of 
the SCF procedure in Figure [T] along the iterations, 
(ii) bandstructure calculations^ along the k space, 
or (iii) time-dependent propagation using spectral 
decomposition along the time-domain^; 

• the inner linear system at each contour point can 
be solved using either direct or iterative methods; 

• the algorithm can be extended efficiently for solv- 
ing non-Hermitian problems (e.g. for performing 
complex bandstructure calculations^); 

• efficient parallel implementations can be naturally 
addressed at three different levels: (i) many search 
intervals can be run independently (no overlap), 
(ii) each linear system can be solved independently 
along the complex contour (e.g. simultaneously on 
different compute nodes), and (iii) the linear sys- 
tems can be solved in parallel (the multiple right 
sides can be parallelized as well). Since the search 
intervals can be arbitrarily narrowed within a par- 
allel environment (i.e. increasing the ratio N/Mq), 
the algorithm complexity is directly dependent on 
solving a single linear system of size N with Mo 
right-hand-sides (i.e. (Z e S — H)Q e = Y in Figure 

B). 

III. DIRECT SOLUTION OF THE NON-LINEAR 
EIGENVECTOR PROBLEM 

Here, we propose to further extend the capabilities of 
the FEAST algorithm in order to solve nonlinear eigen- 
vector problems of the form 

H[n]x, = E^, n = 2]T|x 4 | 2 , (2) 

i 

so that it may be used as a direct and efficient al- 
ternative to SCF methods conventionally based on a 
"Schrodinger-Poisson" iterative procedure with density 
mixing schemes. 

Recently Yang et alJ£ have proposed an alternative 
approach to traditional SCF iterations using a direct con- 
strained minimization (DCM) algorithm for solving the 
nonlinear problem @. DCM consists of constructing and 
updating new search directions for the eigenvector sub- 
space by solving a much smaller optimization problem 
which takes the form of a reduced nonlinear eigenvec- 
tor problem. These results have motivated this current 
work, since a reduced eigenvalue problem can also be 
obtained from the FEAST algorithm, which also pro- 
vides an optimal framework for performing subspace it- 
erations. Indeed, once the search subspace Q is obtained 



by the contour integration in Step-2 of Figure [2] (from a 
given Hamiltonian H[n]), the resulting reduced problem 
in Step-3 can be expressed in a non-linear form i.e. 

H Q [n]* = eS Q *, with n = 2^|Q* l | 2 , (3) 

i 

with H Q [n] = Q T H[n]Q. Using a DFT/Kohn-Sham 
model, Hq accounts for the projection of the non-linear 
Hartree and exchange-correlation terms onto the work- 
ing search subspace Q. One of the primary difference be- 
tween linear FEAST and the proposed non-linear FEAST 
algorithm, named NLFEAST, then lies in Step-3 with the 
formation and solution of the reduced non-linear eigen- 
vector problem. This latter, in turn, can be solved us- 
ing the traditional SCF procedure presented in Figure [1] 
Since the non-linearity appears at the level of the reduced 
system alone, more robust non-linear schemes that would 
have been too expensive to use on the large size original 
system ©, can potentially be considered for addressing 
the reduced system © (e.g. specific Newton-Raphson 
method^). 

As in the linear case, once the solution of the reduced 
system is obtained, the subspace is updated and the 
FEAST algorithm can come back on track and perform 
subspace iterations up until convergence. As it will be 
demonstrated in our experimental results in Section llVl 
however, the algorithm essentially converges faster when 
the size of the subspace is increased with each FEAST 
subspace iteration. These results suggest that the size 
of the search subspace should not be made static for the 
non-linear problem. This then constitutes a major and 
innovative difference with the original FEAST algorithm 
that aims at improving the robustness of the subspace 
iteration technique. The Raleigh-Ritz procedure (i.e. 
Step-3) then uses a new search subspace Q which consists 
not only of the subspace most recently produced by so- 
lution of the contour integration at step k, i.e. Q^ fc ', but 
also of all (or some) of the subspaces generated in previ- 
ous iterations such that Q = {Q {k \ Q^ 1 ), Q (fc ~ 2) , . . . }. 
The subspace provided by the contour integration can 
be appended to Q indefinitely until convergence, or this 
can be done such that only a finite number (typically 3 
or 4) of the most recently generated subspaces are re- 
tained. Interestingly, this procedure is reminiscent of the 
one used in DIIS where new electron densities are con- 
structed from a subspace consisting of previous generated 
electron densities and density residuals. However, tradi- 
tional mixing schemes act on the electron density alone, 
and it is not always guaranteed that the density can be 
expressed as a linear combination of previously generated 
densities. In contrast, the proposed approach is expected 
to be much more robust by essence since a larger and re- 
fined search subspace containing all the eigenvectors of 
interests, is very likely to span the solution of the non- 
linear problem. 

The resulting NLFEAST algorithm is given in Figure 
131 We note that a singular value decomposition (SVD) 
also needs to be performed on the search subspace Q 
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Non-linear eigenvector FEAST algorithm 

Input: interval [E m i n , E max ] including M lowest occupied 
states and an initial search subspace of size Mo > M 

1- Initialization 

a-Define an initial guess for the electron density n 

b-Construct Hamiltonian H[n] 

c-Select M Q > M random vectors Y G R NxM " 

2- Contour Integration at FEAST iteration k 

Compute QW Mo = ~ J^dZ (ZS - H[n]) -1 Y JVXM() , 

3- Non-linear Rayleigh-Ritz 

a-Construct the subspace Q = {Q (fc) , Q (fc_1) , Q (fc_2) , . . . } 
b-Compute Q := U using SVD i.e. Q = UEV T 
c- SCF Procedure 

• construct Hamiltonian H[n] 

(requires also solving the Poisson equation) 

• foml Hq MoXMo = Q T H[n]Q and Sq MoXM(j = Q T SQ 

• solve reduced eigenvalue problem Hq$ = eSq$ 

M 

• compute electron density n = 2 |Q<J? m | 2 

m=l 

• Check convergence of n, go back to step-c if needed 
with a new generated input density 

4- Subspace Iteration 

Set E m = e m and compute X NXMq = Q NXMo $ MoXMo 

Check convergence (i.e. Trace J2 m E ™ G [Emin, E max ])- 
If needed go back to step 2 with Y = SX 
Output: All the M < Mo eigenpairs ({i? m ,x m }). 



FIG. 3. FEAST procedure for solving the non-linear eigenvec- 
tor problem H[n]x = £Sx of size N with H real symmetric 
and S symmetric positive definite (spd), and obtaining all the 
M lowest occupied states within a given interval [i? m i n , -B ma x] 
and hence the ground state electron density. Here, Step 3-c 
of the algorithm is making use of a traditional SCF procedure 
for solving the non-linear reduced system. We note that in 
Step-3a, the column vectors of the newly generated subspace 
are appended to the matrix of the old subspace, increasing the 
total subspace size by Mo- This can be done such that only 
a finite number of the most recently generated subspaces are 
retained, or the subspace size can be increased until conver- 
gence. Additionally, a singular value decomposition (SVD) 
is performed on Q (i.e. Step 3-b), such that only the left 
singular vectors U are used in the Rayleigh-Ritz procedure. 



(i.e. Step 3-b), and only the left-singular vectors are re- 
tained. In contrast to the linear FEAST algorithm, one 
does not want to truncate the size of the search subspace 
for constructing a positive definite reduced matrix Sq. 
This latter is by definition rank deficient and SVD can- 
not then be avoided. In our experiments thus far this 
step has proven to be computationally inexpensive rela- 



tive to other parts of the algorithm, and even so, it can 
be shown that the SVD of Q can be updated at each 
successive FEAST iteration with complexity similar to 
solving a single reduced eigenvalue problem in Step-3c 20 . 
Finally, like other schemes for solving nonlinear eigen- 
vector equations, NLFEAST requires an initial guess for 
the density n (Step-la of Figure[3]). However, the quality 
of the initial guess is not important as far as achieving 
convergence is concerned, as it will be shown in Section 

El 

In summary, NLFEAST can be seen as an inversion 
of the usual process of solving the nonlinear eigenvector 
problem ([2]). Using SCF-DIIS, one iteration corresponds 
to a traditional SCF iteration as presented in Figure Q] 
which requires, in particular, solving a large linear eigen- 
value problem as well as the calculation of the Hartree 
potential by solving Poisson's equation. This scheme 
presents then two iterative procedures: one outer associ- 
ated with the SCF, and one inner due to the eigenvalue 
solver (using a "black-box" eigenvalue solver, those inner 
iterations -intrinsic to eigenvalue algorithms- are most of- 
ten hidden to the user). As discussed earlier, this linear 
eigenvalue problem can ideally be solved using the linear 
FEAST algorithm (in Figure [2J which converges in very 
few iterations of contour integration (typically ~ 3). Us- 
ing NLFEAST, however, those two iterative procedures 
happen now in reverse order. The outer iterations are 
intrinsic to the FEAST algorithm (i.e. subspace itera- 
tions), while the inner iterations are used for solving the 
non-linear reduced system (J3|). In addition to offering a 
more robust mathematical approach, NLFEAST is ex- 
pected to considerably reduce the eigenvalue solve time 
since: (i) the reduced non-linear problem ([3]) is orders 
of magnitude smaller in size as compared to the original 
one ([2]), and (ii) the reduced problem does not need to 
be solved very accurately to guarantee the convergence 
of the outer-iterations. 



IV. NUMERICAL EXPERIMENTS AND CAPABILITIES 

In this section, we propose to demonstrate the nu- 
merical efficiency of the proposed NLFEAST algorithm 
for the non-linear eigenvector problem @. DFT/Kohn- 
Sham/LDA calculations have been performed on vari- 
ous molecules using our in-house all-electron simulation 
framework that uses a real-space cubic finite element dis- 
cretization. Some details of our modeling and real-space 
discretization setup have been provided in Ref. [H The 
Pulay-DIIS technique has been used for performing the 
traditional SCF approach defined in Figure Q] as well 
as for solving the FEAST non-linear reduced system in 
Step-3c of Figure [3] Both DIIS procedures make use of a 
subspace composed of five successive generations of the 
electron densities, but the maximum number of DIIS it- 
erations for solving the reduced system in NLFEAST has 
been fixed to three (i.e. the non-linear reduced system is 
then solved only approximately) . For most of the exam- 
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pies, indeed, increasing the number of the inner iterations 
further has had no effect on the overall NLFEAST con- 
vergence rate. 



A. Performance comparisons 

In all our experiments conducted thus far, including 
various molecules from H2 to Cgo, the NLFEAST al- 
gorithm has outperformed SCF-DIIS both in terms of 
convergence rate and execution time. Three represen- 
tative examples are shown in Figure |4] for the Silanc, 
Benzene and Caffeine molecules. The relative error on 
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FIG. 4. Results of numerical experiments comparing the per- 
formance of our algorithm to that of DIIS for the Silane SiHU, 
Benzene Ce^e and caffeine C8H10N4O2 molecules. The rel- 
ative error on the total energy is used here as the measure 
of convergence. The convergence criteria for NLFEAST has 
been set to 10 -10 while DIIS-SCF was stopped after a max- 
imum of 10 iterations for Silane and 15 iterations for both 
Benzene and Caffeine. The meaning of outer-iterations is dif- 
ferent for both algorithms. 

the total energy at each iteration is used as the mea- 
sure of convergence for both approaches, as this is the 
most directly comparable measure between DIIS-SCF 
and NLFEAST. As discussed in Section [TTT1 the mean- 
ing of the outer-iterations is different for both algorithm 
since, for NLFEAST, it directly represents the number 
of contour integrations and non-linear reduced systems. 

Since DIIS-SCF also takes advantage of the linear 
FEAST algorithm, it is possible to have a direct compar- 
ison between the number of operations counts of DIIS- 
SCF and NLFEAST. In both cases the contour integra- 
tion is the most time consuming single numerical oper- 
ation. In the three examples provided in Figure 2J for 
example, SCF-DIIS requires 5x to 7x more contour in- 
tegrations than NLFEAST for obtaining the same level 
of accuracy (i.e. ~ 1CT 7 ). On the other hand, NLFEAST 
requires the solution of ~ 3 x as many Poisson equations 
as does SCF-DIIS. However, since the Poisson system 



matrix only needs to be factorized once at the beginning 
of the calculation, the remaining Poisson solve operations 
become negligible in comparison to both factorizing and 
solving the complex linear systems that arise from the 
contour integration. The rest of the numerical opera- 
tions involved in NLFEAST, such as solving the reduced 
eigenvalue problem or performing the SVD in Step-3c 
of Figure [31 bring no significant computational overhead 
provided the ratio N/M stays relatively large for a given 
search interval. In practice, it is also reasonable to use 
an original subspace size M that is typically no larger 
than ~ 500, if we assume that the Q subspace keeps 
increasing indefinitely and that convergence is reached 
in less than 10 iterations. Since it is recommended for 
FEAST to use Mq ~ 1.5 x the number of eigenstatesii, 
the basic NLFEAST scheme is expected to provide good 
performances up to ~ 350 states (i.e. 700 electrons with 
a factor 2 for spin) . It should be noted that the case of a 
smaller N/Mq ratio with a relatively large Mo, can lead 
to degradation of performances and memory limitations 
only if the algorithm runs sequentially. The FEAST al- 
gorithm can indeed be readily parallelized and the appli- 
cability of NLFEAST scheme for addressing large-scale 
molecular systems will be discussed in Section fVl 

TableUsummarizes the convergence results we have ob- 
tained using NLFEAST for a sample of molecules ranging 
in size from H2 to Cgo, with results on the total energy 
which are compared with the NWChem software^. Al- 
though we did no attempt to optimize or refine our cubic 
finite element discretization, the total energy results are 
found to be in good agreement with NWChem which is 
making use of completely different numerical approaches 
and convergence criteria (for these reasons also, a direct 
and useful comparison of the convergence rate cannot be 
achieved here). It is also important to note that in the 
case of the non- linear eigenvector problem ^ , the rela- 
tive error on the total energy is not a natural criteria for 
measuring the convergence. For NLFEAST, the criteria 
is instead defined in terms of the error on the trace of the 
eigenvalues. From our numerical experiments, once this 
convergence criteria is satisfied, it also provides very low 
non-linear residual. For the simulations reported in Table 
U we have also used: (i) a Q search subspace that keeps 
increasing indefinitely until convergence, (ii) 16 nodes for 
the Gauss quadrature along the complex contour, (iii) a 
conventional initial guess as the starting point for the 
electron density (e.g. the all-electron result for the single 
atoms). In the following subsections, the robustness of 
NLFEAST will be discussed and demonstrated further 
by separately addressing each one of those points. 



B. Convergence Rate and Q subspace size 

The size of the search subspace Q (in step-3a of Fig- 
ure [3]) may be limited by only retaining a finite num- 
ber of the most recent subspaces, or it may be extended 
indefinitely until convergence. For the numerical exper- 
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^electrons ^iterations Etot (eV) NWChem 



H 2 


2 


5 


-30.962 


-30.959 


CH 4 


10 


5 


-1091.45 


-1091.69 


H 2 


10 


7 


-2065.24 


-2065.48 


CO 


14 


6 


-3060.12 


-3060.45 


SiH 4 


18 


9 


-7907.42 


-7909.78 


Na 2 


22 


8 


-8785.57 


-8786.61 


CeHg 


42 


6 


-6262.31 


-6263.65 


C 8 HioN 4 2 


102 


9 


-18364.5 


-18360.3 


Ceo 


360 


8 


-61610.5 


-61660.1 



TABLE I. Convergence rate results for NLFEAST for vari- 
ous molecules (the molecular geometries are obtained from 
the experimental data^). The convergence criteria is sat- 
isfied for NLFEAST when the relative error on trace is be- 
low 10 -10 , which also provides very low non-linear relative 
residual max m (||H(x m )x m - E m Sx m ||/||H(x m )x m [|), typi- 
cally here below 10~ 8 . The total energy results using our 
cubic real-space FEM code are in good agreement with those 
obtained by NWChem ?? using the cc — pvqz basis from H 2 to 
CeHe, and the 6 — 31g* basis for CsHioN40 2 and Ceo- 

iments presented in Figure 0] and Table U the rate of 
convergence is expected to reach a maximum because all 
generated subspaces have been retained. Figure [5] shows 
the results of several numerical experiments that were 
performed for our selected three molecules using differ- 
ent sizes for the search subspace Q. From these results, 



subspaces. By retaining a limited number of subspaces, 
it becomes also possible to increase the size of Mo to 
consider a larger number of electrons for a given search 
interval without resorting yet to explicit parallelism. In 
the case of Benzene and Caffeine, one also notes that the 
algorithm does not appear to converge at all when only 
a single Q 1 -^ subspace is used, this subspace being the 
one that was most recently generated. Although this is 
not necessarily typical and the solution may eventually 
converge using a more efficient approach for solving the 
non-linear reduced problem (e.g. using a larger number 
of inner iterations for the FEAST-DIIS problem which 
has been fixed to three in our simulations), it highlights 
the importance of extending the search subspace size for 
the success of this algorithm. 



C. Convergence Rate and Contour Integration Accuracy 

At each iteration of NLFEAST, the approximate sub- 
space solution is improved through multiplication by the 
density matrix of the most current Hamiltonian (see 
equation [T]) . This step, Step 2 in Figure [3l is accom- 
plished by performing a numerical contour integration 
of the Green's function multiplied by the approximate 
solution. In practice, a n-point Gauss quadrature can 
efficiently be used here, which involves summing the so- 
lutions of n separate linear systems. For all the results 
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FIG. 5. Numerical experiments demonstrating the role of the 
number of most recent retained subspaces for construct- 
ing the search subspace Q = {Q (,c) , Q (fc_1) , Q (fc ~ 2) , . . . } used 
by NLFEAST. The convergence rate depends on this num- 
ber that here varies between one and "infinity" (i.e. where 
the search subspace keeps increasing until convergence is 
reached) . 

one observe that the rate of convergence of NLFEAST 
is directly related to the number of subspaces that are 
retained, but a high rate of convergence can still be ob- 
tained by retaining only three to four of the most recent 



FIG. 6. Experiments demonstrating the effect on convergence 
of the accuracy of the contour integration in NLFEAST. Dif- 
ferent curves represent the convergence for different numbers 
of Gauss points used in the numerical contour integration 
(Step 2 in Figure [3]). For all these results, the search sub- 
space Q keeps increasing indefinitely until convergence. 

presented so far, 16 Gauss contour points were used to 
perform the quadrature. Figure [6] presents a new set 
of convergence results for our three selected molecules, 
while considering the variation of the number of Gauss 
points from as low as 4 to as high as 48. Although increas- 
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ing the number of Gauss points can help to improve the 
convergence rate, the effect here is not quite as dramatic 
as the effect of increasing the subspace size as discussed 
previously. We note here very similar convergence be- 
haviors between NLFEAST and the FEAST algorithm 
for the linear problem. This latter admits a mathemati- 
cal convergence proof that shows that the actual conver- 
gence rate depends on both the accuracy of the contour 
integration and the size of the search subspace^. 



D. Robustness and initial guess 

Like other schemes for solving nonlinear eigenvector 
equations, NLFEAST requires an initial guess for the 
density n. Clearly, a good initial guess can provide faster 
convergence, but for NLFEAST the quality of the initial 
guess is not important as far as achieving convergence is 
concerned. Unlike other means of performing SCF itera- 
tions, our algorithm is capable of achieving convergence 
even when given an extremely poor initial guess includ- 
ing the extreme case of no initial guess at all (i.e. n = 0). 
Figure [7] shows the results of a number of numerical ex- 
periments wherein the initial guess for the electron den- 
sity was set to zero. Each experiment eventually resulted 
in convergence. The algorithm's performance when given 
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FIG. 7. Results of numerical experiments where the initial 
guess for the electron density was set to zero, n = 0. The 
maximum number of subspace to form Q has been set equal 
to 10 for all molecules but C8H10N4O2 and C60 where it was 
set equal to 4. The convergence is reached for an error on the 
trace smaller than 10 -9 

such a poor initial guess is partly related to the size of 
the system; more electrons typically means that a larger 
number of iterations is required before convergence is 
reached. Performance also appears to depend on the par- 
ticular molecule under consideration. In Figure CH 4 
requires a larger number of iterations to reach a high level 
of convergence, despite being the second smallest system 



(in terms of number of electrons). It is likely that, for 
some molecules, the algorithm converges towards a local 
energy minimum before ultimately finding its way to the 
global minimum, which results in its progress being more 
delayed than it would be with a molecule where such a 
detour does not occur. The purpose of Figure [7] is just 
to illustrate the robustness of the algorithm by consid- 
ering an extreme (academic) case. As shown in Table HI 
a conventional initial guess has, so far, guaranteed low 
residual convergence in less than 10 FEAST iterations 
for all the molecules that we have experimented with. 



V. CONCLUSION AND FUTURE WORK 

A new eigensolver-based strategy, named NLFEAST, 
is derived from a generalization of the FEAST eigenvalue 
algorithm for solving the non-linear eigenvector prob- 
lem i.e. H({ip})ip — Eip such as the one arising from 
the DFT/Kohn-Sham electronic structure model. By 
providing a fundamental and practical numerical solu- 
tion for addressing the non-linearity with the occupied 
eigenvectors, NLFEAST offers a very efficient and ro- 
bust alternative to the traditional self-consistent proce- 
dure using density- mixing schemes such as Pulay-DIIS. 
Several numerical experiments using the DFT/Kohn- 
Sham/LDA model, have demonstrated the significant po- 
tential of NLFEAST to outperform the traditional SCF 
mixing techniques in terms of convergence rate, robust- 
ness, and numerical operation counts. Strictly speaking, 
NLFEAST should not be considered as a direct com- 
petitor to current methods and it can rather be seen as 
an alternative formulation on how to handle the non- 
linearity. Indeed, the resulting non-linear reduced system 
can, in turn, be addressed using any SCF procedures. In 
our simulations, the reduced system did not need also 
to be solved accurately since the overall convergence of 
our scheme benefits from the robustness of the FEAST 
subspace iterations (which also limits the number of inner 
SCF iterations). This feature could potentially be further 
exploited while considering Hartree-Fock or DFT hybrid 
models where the construction of the Hamiltonian system 
is more involved than solving a single Poisson equation. 

We note that a practical implementation of the tech- 
nique can be achieved effectively using the FEAST solver 
packag o 24 ' 25 . As such, the migration of electronic struc- 
ture codes making use of SCF mixing schemes, could pro- 
ceed within two steps: (i) integration of the FEAST pack- 
age as the main linear eigenvalue solver, (ii) reverse-order 
the SCF-process as it was stated in Section UTTT Since the 
FEAST solver is reverse communication ready, it pro- 
vides the flexibility to build a customized NLFEAST with 
specific computational modules for the problem at hand 
(independently of the physical and discretization mod- 
els). 

In this paper, the applicability of NLFEAST has also 
been demonstrated for a given search interval which can 
include several hundred states. In order to go beyond 
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the current performance capabilities of NLFEAST for ad- 
dressing much larger molecular systems, including nanos- 
tructures of current technological interest, explicit paral- 
lelism will become necessary. The parallel treatment of 
NLFEAST should directly benefit from the intrinsic par- 
allel capability of the FEAST algorithm (including the 
three level of parallelism mentioned in Section [TTJ) . For 
the all-electron model, in particular, the algorithm can 
act on different energy ranges (with no overlap for the 
contour integrations), in order to capture core or valence 
electrons independently. For example, the results in Fig- 
ure |S] clearly illustrate the various core and valence re- 
gions within the energy spectrum distribution for a given 
molecular configuration. 
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FIG. 8. Plots of the density of states for buckminsterfullerene 
(Ceo) and caffeine (C8H10N4O2). Our all-electron code is able 
to capture both the valence states and the core states of each 
type of atom. From those results, we can clearly identify the 
low energy core regions of Oxygen, Nitrogen, and Carbon. 

With many search intervals present in the simulation, 
however, one would also need to address the parallel scal- 
ability for solving the non-linear reduced system. Using 
the linear FEAST within a traditional SCF process, there 
is no issue, since the number of reduced systems is equal 
to the number of intervals. Using NLFEAST, however, 
the situation becomes less straightforward since the con- 
struction of the non-linear reduced system depends (in 
principle) on the solutions of the contour integrations for 



all search intervals. The development of new direct or 
iterative schemes for solving the resulting reduced eigen- 
value problem that scale linearly with the number of 
search intervals will need to be addressed in future re- 
search. 
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